pycoQC Usage Notebook

Example files

The pycoQC repository contains several example sequencing summary files generated with various versions of Albacore and Guppy. Each of those files contains only 10,000 reads.

  • ./docs/data/Albacore-1.2.1_basecall-1D-DNA_small_sequencing_summary.txt.gz
  • ./docs/data/Albacore-1.2.3_basecall-1D-RNA_small_sequencing_summary.txt.gz
  • ./docs/data/Albacore-1.7.0_basecall-1D-DNA_small_sequencing_summary.txt.gz
  • ./docs/data/Albacore-2.1.10_basecall-1D-DNA_small_sequencing_summary.txt.gz
  • ./docs/data/Albacore-2.1.10_basecall-1D-RNA_small_sequencing_summary.txt.gz
  • ./docs/data/Albacore-2.3.1_basecall-1D-RNA_small_sequencing_summary.txt.gz
  • ./docs/data/Guppy-2.1.3_basecall-1D-RNA_small_sequencing_summary.txt.gz
  • ./docs/data/Guppy-2.1.3_basecall-1D-barcoding_DNA_small_sequencing_summary.txt.gz

In addition to these summary files, Guppy now stores the barcode information in a separate barcoding summary file. There is one example in pycoQC:

  • ./docs/data/Guppy-2.1.3_basecall-1D-barcoding_DNA_small_barcoding_summary.txt.gz

Using pycoQC

General information

pycoQC is a simple class that is initialized with a text summary file generated by ONT Albacore or Guppy. For a 1D run, use the file named sequencing_summary.txt available at the root of the Albacore output directory. For 1D2, use sequencing_1dsq_summary.txt, which can be found in the 1dsq_analysis directory.

The methods of the instantiated object can subsequently be called to generate tables and plots.

There are a few different ways to get help for all the public package functions:

  • In a separate window with the jupyter magic "?": ?pycoQC.channels_activity
  • In an output cell with the standard help function: help (pycoQC.channels_activity)
  • Inline with the cursor on the function of interest use shift + tab

Imports

For plotly offline plotting

Import pycoQC main class as well as Plotly and enable inline plotting in the current Notebook.

This is the recommended option. It ensures that all your data are stored inside the notebook.

The limitation is that generating many plots from large datasets makes the notebook quite heavy and slow.

In [12]:
# Run cell with Ctrl + Enter
from pycoQC.pycoQC import pycoQC
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode (connected=False)

For plotly online plotting

This option takes advantage of the Plotly web service for hosting graphs. It requires setting up an account (https://plot.ly/python/getting-started/#initialization-for-online-plotting) and providing credentials in the notebook. This could be a good option for easily sharing the interactive plots generated by pycoQC.

In [13]:
# Only run this cell if you have already set up a Plotly account and want to use the Plotly web service
# from plotly.plotly import plot, iplot
# import plotly.tools as pt
# pt.set_credentials_file (username="XXXXXXXXXX", api_key="XXXXXXXXXX")

Initialisation

Upon initialization pycoQC reads the sequencing summary file, runs a series of tests and pre-processes the data for the plotting methods.

Sequencing_summary file

pycoQC can read compressed sequencing_summary.txt files (‘gzip’, ‘bz2’, ‘zip’, ‘xz’). Instead of a single file, it is also possible to pass a UNIX-style wildcard pattern (expanded with glob) to match multiple files.
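
As a rough illustration of what passing a pattern implies, the sketch below expands a wildcard pattern with glob and reads the header line of each matching file, choosing a decompressor from the file extension. It is not pycoQC's own code: the helper names open_summary and read_headers are hypothetical, and zip archives are left out for brevity.

```python
import bz2
import glob
import gzip
import lzma

# Map file extensions to the stdlib opener able to read them as text.
# Assumption: the extension reliably indicates the compression format.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_summary(path):
    """Open a plain or compressed summary file in text mode."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            return opener(path, "rt")
    return open(path, "rt")


def read_headers(pattern):
    """Return {file path: list of column names} for every file matching pattern."""
    headers = {}
    for path in sorted(glob.glob(pattern)):
        with open_summary(path) as handle:
            # Summary files are tab-separated; the first line holds the column names
            headers[path] = handle.readline().rstrip("\n").split("\t")
    return headers
```

Calling read_headers("./data/*RNA*") before initialising pycoQC is a quick way to check which files a pattern picks up and which columns they share.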

Depending on the run type and the version of Albacore used, some information might not be available. In particular, calibration reads were not flagged in early versions of Albacore. When the field is available, those reads can be discarded with filter_calibration (False by default in the signature printed below). Similarly, barcode information is only available in multiplexed runs.

Run type

The type of run (1D or 1D2) is automatically detected but can be explicitly enforced with run_type if needed.

Run ID reordering

There are often several run IDs in a single sequencing_summary file. Unfortunately, there is no way to know the correct order based on the information contained in the sequencing_summary.txt file alone. By default pycoQC automatically reorders the runs by decreasing throughput, which should normally reflect the sequencing order. However, if you know the order you can specify it at initialisation with the option runid_list. This option can also be used to select specific run IDs.
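
The reordering idea can be sketched in a few lines of plain Python. This is only an approximation under stated assumptions: throughput is taken here as bases per second between the first and last read of a run, and each run is shifted by the cumulative duration of the runs placed before it. The function names are hypothetical and pycoQC's internal computation may differ.

```python
def order_runs_by_throughput(reads):
    """Return run IDs sorted by decreasing bases per second.

    reads: list of (run_id, start_time_seconds, sequence_length) tuples.
    """
    stats = {}
    for run_id, start_time, length in reads:
        s = stats.setdefault(run_id, {"bases": 0, "first": start_time, "last": start_time})
        s["bases"] += length
        s["first"] = min(s["first"], start_time)
        s["last"] = max(s["last"], start_time)

    def throughput(run_id):
        s = stats[run_id]
        duration = max(s["last"] - s["first"], 1)  # avoid division by zero
        return s["bases"] / duration

    return sorted(stats, key=throughput, reverse=True)


def time_offsets(reads, run_order):
    """Give each run a start offset so that runs follow one another in run_order."""
    last_start = {}
    for run_id, start_time, _ in reads:
        last_start[run_id] = max(last_start.get(run_id, 0), start_time)
    offsets, elapsed = {}, 0
    for run_id in run_order:
        offsets[run_id] = elapsed
        elapsed += last_start[run_id]
    return offsets
```

If the resulting order does not match what you know about the experiment, passing the run IDs explicitly through runid_list is the safer choice.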

Minimal "pass" quality

By default pycoQC assumes that the minimal mean quality for a "pass" read is 7 (same as default Albacore value). However if you want to adjust the value, you can specify it at initialisation with min_pass_qual.
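
To make the effect of the threshold concrete, here is a minimal sketch of how a "pass" count could be derived from mean quality scores. The function name is hypothetical, and treating the threshold as inclusive (>=) is an assumption rather than something stated above.

```python
def count_pass_reads(mean_qscores, min_pass_qual=7):
    """Count reads whose mean quality reaches the threshold.

    Assumption: a read exactly at the threshold counts as "pass".
    """
    return sum(1 for q in mean_qscores if q >= min_pass_qual)
```

Raising min_pass_qual can only lower the "Pass reads" figure reported in the summary, never raise it.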

In [14]:
help (pycoQC.__init__)
Help on function __init__ in module pycoQC.pycoQC:

__init__(self, seq_summary_file, barcode_summary_file=None, runid_list=[], min_pass_qual=7, filter_calibration=False, verbose_level=0)
    Parse Albacore sequencing_summary.txt file and clean-up the data
    * seq_summary_file: STR
        Path to the sequencing_summary generated by Albacore 1.0.0 + (read_fast5_basecaller.py) / Guppy 2.1.3+ (guppy_basecaller).
        One can also pass a UNIX style regex to match multiple files with glob https://docs.python.org/3.6/library/glob.html
    * barcode_summary_file: STR
        Path to the barcode_summary_file generated by Guppy 2.1.3+ (guppy_barcoder). This is not a required file.
        One can also pass a UNIX style regex to match multiple files with glob https://docs.python.org/3.6/library/glob.html
    * runid_list: LIST of STR [Default []]
        Select only specific runids to be analysed. Can also be used to force pycoQC to order the runids for
        temporal plots, if the sequencing_summary file contain several sucessive runs. By default pycoQC analyses
        all the runids in the file and uses the runid order as defined in the file.
    * filter_calibration BOOL [Default False]
        If True read flagged as calibration strand by the software are removed
    * min_pass_qual INT [Default 7]
        Minimum quality to consider a read as 'pass'
    * verbose_level INT [Default 0]
        Level of verbosity, from 2 (Chatty) to 0 (Nothing)

Basic initialisation

In [15]:
# Run cell with Ctrl + Enter
p = pycoQC("./data/Albacore-1.7.0_basecall-1D-DNA_small_sequencing_summary.txt.gz")
print (p)
[pycoQC]
	Total reads: 9,562
	Pass reads: 8,042
	Minimal Pass Quality: 7
	Run Duration: 44.78 h
	Total Bases: 14,664,334
	Barcode found: True

Initialisation with summary file regex and maximum verbose level

In [16]:
p = pycoQC("./data/*RNA*", verbose_level=2)
Import raw data from sequencing summary files
	Sequencing summary files found: ['./data/Guppy-2.1.3_basecall-1D-RNA_small_sequencing_summary.txt.gz', './data/Albacore-1.2.3_basecall-1D-RNA_small_sequencing_summary.txt.gz', './data/Albacore-2.1.10_basecall-1D-RNA_small_sequencing_summary.txt.gz', './data/Albacore-2.3.1_basecall-1D-RNA_small_sequencing_summary.txt.gz']
	40,000 reads found in initial file
Verify fields and discard unused columns
	1D Run type
	Columns found: ['read_id', 'run_id', 'channel', 'start_time', 'sequence_length_template', 'mean_qscore_template']
Drop lines containing NA values
	0 reads discarded
Filter out zero length reads
	813 reads discarded
Sort run IDs by decreasing throughput
	Run-id order ['7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415', '5074e0cd71f372314c30ca5158aab2172d915023', '9835d20f1d205bdbd1fb4d464ae778de95beab24', 'c675730269f2f96f300f1cfa613fe89c53b344c3', '2b9163100702bba6ac29d37dbc96ccad740aa05d', 'd0054681152930b21276405d948b115e46968ca6', '71055637dd56eca9416305332eba1ed37bbfffe1', 'db5916f2fe7957afac1d0aaccdec883342c4bc31', '93fa1ad3ebc8a6e505d991bcb052c2b8ceb278b5', '17b317b994031430f350cda1dc13a72f66572ece']
	Reorder runids
	Processing reads with Run_ID 7ae4f0a6d2b7ba3e0248496b7de9cd5d1c028415 / time offset: 0
	Processing reads with Run_ID 5074e0cd71f372314c30ca5158aab2172d915023 / time offset: 5309.74734
	Processing reads with Run_ID 9835d20f1d205bdbd1fb4d464ae778de95beab24 / time offset: 15911.26726
	Processing reads with Run_ID c675730269f2f96f300f1cfa613fe89c53b344c3 / time offset: 183649.42351
	Processing reads with Run_ID 2b9163100702bba6ac29d37dbc96ccad740aa05d / time offset: 184044.468
	Processing reads with Run_ID d0054681152930b21276405d948b115e46968ca6 / time offset: 184432.95738
	Processing reads with Run_ID 71055637dd56eca9416305332eba1ed37bbfffe1 / time offset: 184828.68148
	Processing reads with Run_ID db5916f2fe7957afac1d0aaccdec883342c4bc31 / time offset: 229024.07989
	Processing reads with Run_ID 93fa1ad3ebc8a6e505d991bcb052c2b8ceb278b5 / time offset: 401812.72839
	Processing reads with Run_ID 17b317b994031430f350cda1dc13a72f66572ece / time offset: 402176.51159
Reindex dataframe by read_ids
[pycoQC]
	Total reads: 39,187
	Pass reads: 31,661
	Minimal Pass Quality: 7
	Run Duration: 146.21 h
	Total Bases: 41,323,794
	Barcode found: False

Generating plots and tables

Interaction with Plotly library

Plots are generated with plotly for Python and return a plotly Figure object that can be used by users for:

  • Further customization using the numerous methods attached to the Figure object
  • Inline plotting in Jupyter Notebook using iplot (either from plotly.plotly or plotly.offline)
  • Generating a separate HTML file with plot (either from plotly.plotly or plotly.offline)
  • Exporting as a static image (https://plot.ly/python/static-image-export/), pdf (https://plot.ly/python/pdf-reports/) or various text formats.

In this notebook we will use the inline plotting option with the offline plotly library.

Users can also customize the figures online in a user-friendly environment by clicking on "Edit in Chart Studio" in the upper right corner of each figure.

Similarly static pictures can be exported using the "Download plot as a png" button.

Common arguments

All the methods have the arguments width and height that can be used to customize the plotting area. In general we do not recommend modifying these values as they might disrupt the plot layout.

Most of the methods also have the argument sample. By default pycoQC downsamples the number of reads to 100,000 before plotting. This drastically reduces the processing time for large datasets and has a very limited impact on the appearance of the plot. The sampling is random but deterministic, meaning that you should always obtain the same results for the same dataset. The value can be changed to increase or decrease the number of reads. Alternatively, one can deactivate the behavior by specifying sample=False.
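
"Random but deterministic" sampling is usually obtained by seeding the random number generator. The sketch below shows the idea with the standard library only. The function name and the seed value are hypothetical, and pycoQC may rely on a different mechanism internally.

```python
import random


def downsample(reads, sample=100000, seed=42):
    """Return at most `sample` reads, always the same ones for the same input.

    sample=False (or any value >= len(reads)) returns every read.
    The fixed seed is an illustrative assumption.
    """
    if not sample or sample >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # private generator, does not disturb global state
    return rng.sample(list(reads), sample)
```

Because the generator is re-seeded on every call, two calls on the same dataset select the same reads, which is what keeps the plots reproducible.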

Overall data summary

The summary method generates a simple summary table with a clickable button to switch from "all reads" to "pass reads" only.

In [17]:
help(pycoQC.summary)
Help on function summary in module pycoQC.pycoQC:

summary(self, width=None, height=None, plot_title='Run summary')
    Plot an interactive summary table
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel

In [18]:
# Run cell with Ctrl + Enter
p = pycoQC("./data/*RNA_small_sequencing_summary.txt.gz")
fig = p.summary()
iplot (fig, show_link=False)

Read Length and Mean quality distribution

pycoQC has 3 methods to visualize the distribution of mean quality scores and of estimated read length:

  • reads_len_1D: A histogram of estimated read length in logarithmic scale
  • reads_qual_1D: A histogram of mean quality scores
  • reads_len_qual_2D: A density contour plot of estimated read length vs mean quality scores in semilog scale

Although we recommend sticking to the default values, all 3 methods allow users to customize the plots.

  • The number of bins used to divide the read quality and/or length space can be specified with nbins for the 1D plots and len_nbins / qual_nbins for the 2D plot
  • The intensity of line smoothing (using a Gaussian kernel filter) can be specified with smooth_sigma
  • Additional cosmetic customizations are available: color for the 1D plots and colorscale for the 2D plot
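
To give a feel for what nbins and smooth_sigma control, the following sketch bins read lengths on a log10 scale and smooths the counts with a Gaussian kernel, using only the standard library. It is a simplified stand-in, not pycoQC's implementation. The function names, the kernel radius of 4 sigma and the edge renormalisation are all assumptions.

```python
import math


def log_histogram(lengths, nbins=200):
    """Count reads per bin, with bins evenly spaced on a log10 scale."""
    logs = [math.log10(n) for n in lengths if n > 0]
    low, high = min(logs), max(logs)
    width = (high - low) / nbins or 1.0  # guard against identical lengths
    counts = [0] * nbins
    for value in logs:
        index = min(int((value - low) / width), nbins - 1)  # clamp the maximum
        counts[index] += 1
    return counts


def gaussian_smooth(values, sigma=2):
    """Smooth a list with a Gaussian kernel, renormalised at the edges."""
    radius = int(4 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total = weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):  # ignore kernel taps falling outside the data
                total += w * values[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed
```

A larger sigma spreads each bin over more neighbours and flattens narrow peaks, while more bins give a finer but noisier curve before smoothing.
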
In [19]:
help(pycoQC.reads_len_1D)
Help on function reads_len_1D in module pycoQC.pycoQC:

reads_len_1D(self, color='lightsteelblue', width=None, height=500, nbins=200, smooth_sigma=2, sample=100000, plot_title='Distribution of read length')
    Plot a distribution of read length (log scale)
    * color: Color of the area (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * nbins: Number of bins to devide the x axis in
    * smooth_sigma: standard deviation for Gaussian kernel
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [20]:
# Run cell with Ctrl + Enter
p = pycoQC("./data/Albacore-2.1.10_basecall-1D-RNA_small_sequencing_summary.txt.gz")
fig = p.reads_len_1D()
iplot(fig, show_link=False)
In [21]:
help(pycoQC.reads_qual_1D)
Help on function reads_qual_1D in module pycoQC.pycoQC:

reads_qual_1D(self, color='salmon', width=None, height=500, nbins=200, smooth_sigma=2, sample=100000, plot_title='Distribution of read quality scores')
    Plot a distribution of quality scores
    * color: Color of the area (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * nbins: Number of bins to devide the x axis in
    * smooth_sigma: standard deviation for Gaussian kernel
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [22]:
# Run cell with Ctrl + Enter
p = pycoQC("./data/Albacore-2.1.10_basecall-1D-RNA_small_sequencing_summary.txt.gz")
fig = p.reads_qual_1D()
iplot(fig, show_link=False)
In [23]:
help(pycoQC.reads_len_qual_2D)
Help on function reads_len_qual_2D in module pycoQC.pycoQC:

reads_len_qual_2D(self, colorscale=[[0.0, 'rgba(255,255,255,0)'], [0.1, 'rgba(255,150,0,0)'], [0.25, 'rgb(255,100,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(70,0,0)']], width=None, height=600, len_nbins=200, qual_nbins=75, smooth_sigma=2, sample=100000, plot_title='Mean read quality per sequence length')
    Plot a 2D distribution of quality scores vs length of the reads
    * colorscale: a valid plotly color scale https://plot.ly/python/colorscales/ (Not recommanded to change)
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * len_nbins: Number of bins to divide the read length values in (x axis)
    * qual_nbins: Number of bins to divide the read quality values in (y axis)
    * smooth_sigma: standard deviation for 2D Gaussian kernel
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [24]:
# Run cell with Ctrl + Enter
p = pycoQC("./data/*RNA*")
fig = p.reads_len_qual_2D ()
iplot(fig, show_link=False)

Sequencing output quality and length over experiment time

pycoQC can generate plots showing the evolution of the sequencing output (output_over_time), the mean read quality (qual_over_time) and the read length (len_over_time) over the course of the sequencing run.

Please be aware that if there are multiple run IDs in the source file(s), pycoQC reorders the run IDs by decreasing throughput per second, as explained in Initialisation. This means that the over_time plots could be wrong, particularly when mixing several runs together.

For qual_over_time and len_over_time, the argument smooth_sigma can be used to modulate the smoothing factor of the Gaussian filter if you are not satisfied with the default result.

The colors of all three plots can be fully customised:

  • cumulative_color and interval_color for output_over_time
  • median_color, quartile_color and extreme_color for qual_over_time and len_over_time
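
The interval and cumulative curves of an output-over-time plot can be approximated as below: reads are assigned to equal-width time bins, bases are summed per bin, and a running total gives the cumulative yield. The function name and input layout are hypothetical. Only the time_bins parameter mirrors the signature shown in this notebook.

```python
def yield_over_time(reads, time_bins=500):
    """Return (interval, cumulative) bases per time bin.

    reads: list of (start_time_seconds, sequence_length) tuples.
    """
    start = min(t for t, _ in reads)
    end = max(t for t, _ in reads)
    width = (end - start) / time_bins or 1.0  # guard against a single time point
    interval = [0] * time_bins
    for t, length in reads:
        index = min(int((t - start) / width), time_bins - 1)  # clamp the last read
        interval[index] += length
    cumulative, total = [], 0
    for bases in interval:
        total += bases
        cumulative.append(total)
    return interval, cumulative
```

If the run IDs are placed in the wrong order, reads end up in the wrong time bins, which is why the warning above matters for these plots in particular.
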
In [25]:
help(pycoQC.output_over_time)
Help on function output_over_time in module pycoQC.pycoQC:

output_over_time(self, cumulative_color='rgb(204,226,255)', interval_color='rgb(102,168,255)', width=None, height=500, time_bins=500, sample=100000, plot_title='Output over experiment time')
    Plot a yield over time
    * cumulative_color: Color of cumulative yield area (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * interval_color: Color of interval yield line (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * time_bins: Number of bins to divide the time values in (x axis)
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [26]:
# Run cell with Ctrl + Enter
p  = pycoQC ("./data/Albacore-1.2.1_basecall-1D-DNA_small_sequencing_summary.txt.gz")
fig = p.output_over_time ()
iplot(fig, show_link=False)
In [27]:
help (pycoQC.qual_over_time)
Help on function qual_over_time in module pycoQC.pycoQC:

qual_over_time(self, median_color='rgb(250,128,114)', quartile_color='rgb(250,170,160)', extreme_color='rgba(250,170,160,0.5)', smooth_sigma=1, width=None, height=500, time_bins=500, sample=100000, plot_title='Read quality over experiment time')
    Plot a mean quality over time
    * median_color: Color of median line color (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * quartile_color: Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * extreme_color:: Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-col
    * smooth_sigma: sigma parameter for the Gaussian filter line smoothing
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * time_bins: Number of bins to divide the time values in (x axis)
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [28]:
# Run cell with Ctrl + Enter
p  = pycoQC ("./data/Albacore-2.1.10_basecall-1D-DNA_small_sequencing_summary.txt.gz")
fig = p.qual_over_time ()
iplot(fig, show_link=False)
In [29]:
help (pycoQC.len_over_time)
Help on function len_over_time in module pycoQC.pycoQC:

len_over_time(self, median_color='rgb(102,168,255)', quartile_color='rgb(153,197,255)', extreme_color='rgba(153,197,255,0.5)', smooth_sigma=1, width=None, height=500, time_bins=500, sample=100000, plot_title='Read length over experiment time')
    Plot a read length over time
    * median_color: Color of median line color (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * quartile_color: Color of inter quartile area and lines (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * extreme_color:: Color of inter extreme area and lines (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-col
    * smooth_sigma: sigma parameter for the Gaussian filter line smoothing
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel
    * time_bins: Number of bins to divide the time values in (x axis)
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [30]:
# Run cell with Ctrl + Enter
p  = pycoQC ("./data/Albacore-2.1.10_basecall-1D-DNA_small_sequencing_summary.txt.gz")
fig = p.len_over_time ()
iplot(fig, show_link=False)

Barcode distribution

When barcoding information is available, it is possible to generate a pie chart of the barcode count distribution. If no barcode information is available pycoQC throws an error.

It is not rare to have non-relevant barcodes detected at very low levels. By default any barcode accounting for less than 0.1% of the reads is excluded from the plot, but this can be changed with min_percent_barcode.

Similar to the previously described methods colors are customisable with colors.
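
The filtering controlled by min_percent_barcode amounts to a percentage cut-off on barcode counts. A minimal sketch, assuming one barcode label per read and an inclusive threshold (both assumptions, as is the function name):

```python
from collections import Counter


def barcode_percentages(barcodes, min_percent_barcode=0.1):
    """Return {barcode: percent of reads}, dropping barcodes below the cut-off.

    barcodes: one barcode label per read.
    """
    counts = Counter(barcodes)
    total = sum(counts.values())
    percents = {b: 100 * n / total for b, n in counts.items()}
    return {b: p for b, p in percents.items() if p >= min_percent_barcode}
```

With 2,000 reads, a barcode seen once represents 0.05% of the data and would be dropped at the default cut-off.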

In [31]:
help(pycoQC.barcode_counts)
Help on function barcode_counts in module pycoQC.pycoQC:

barcode_counts(self, min_percent_barcode=0.1, colors=['#f8bc9c', '#f6e9a1', '#f5f8f2', '#92d9f5', '#4f97ba'], width=None, height=500, plot_title='Percentage of reads per barcode')
    Plot a mean quality over time
    * min_percent_barcode: minimal percentage od total reads for a barcode to be reported
    * colors: List of colors (hex, rgb, rgba, hsl, hsv or any CSV named colors https://www.w3.org/TR/css-color-3/#svg-color
    * width: With of the ploting area in pixel
    * height: height of the ploting area in pixel

Albacore output example

In [32]:
# Run cell with Ctrl + Enter
p  = pycoQC ("./data/Albacore-1.2.3_basecall-1D-RNA_small_sequencing_summary.txt.gz")
fig = p.barcode_counts ()
iplot(fig, show_link=False)

Guppy output example

In [33]:
# Run cell with Ctrl + Enter
p  = pycoQC (
    seq_summary_file="./data/Guppy-2.1.3_basecall-1D-barcoding_DNA_small_sequencing_summary.txt.gz",
    barcode_summary_file="./data/Guppy-2.1.3_basecall-1D-barcoding_DNA_small_barcoding_summary.txt.gz")
fig = p.barcode_counts ()
iplot(fig, show_link=False)

Channels activity over time

Although the flowcell layout could be visually attractive (see https://github.com/mattloose/flowcellvis) this is not very informative on how the channels generate data during the run.

The channels_activity method generates a heatmap style plot showing the output over time per channel.

The number of channels can be changed to match MinION flowcells (512, the default) or PromethION flowcells (3000).

The argument smooth_sigma can be used to modulate the smoothing factor of the Gaussian smoothing filter.

Colors can be changed with colorscale.
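
The data behind such a heatmap is essentially a matrix of bases per channel and per time bin. The sketch below builds that matrix in plain Python. The function name, the n_channels parameter and the assumption that channels are numbered from 1 are illustrative and not taken from pycoQC.

```python
def channel_activity_matrix(reads, n_channels=512, time_bins=150):
    """Return a time_bins x n_channels matrix of sequenced bases.

    reads: list of (channel, start_time_seconds, sequence_length) tuples.
    """
    start = min(t for _, t, _ in reads)
    end = max(t for _, t, _ in reads)
    width = (end - start) / time_bins or 1.0  # guard against a single time point
    matrix = [[0] * n_channels for _ in range(time_bins)]
    for channel, t, length in reads:
        row = min(int((t - start) / width), time_bins - 1)
        matrix[row][channel - 1] += length  # channels assumed to be numbered from 1
    return matrix
```

Columns that stay at zero after a given row correspond to channels that stopped producing reads, which is the kind of pattern the plot is meant to reveal.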

In [34]:
help(pycoQC.channels_activity)
Help on function channels_activity in module pycoQC.pycoQC:

channels_activity(self, colorscale=[[0.0, 'rgba(255,255,255,0)'], [0.01, 'rgb(255,255,200)'], [0.25, 'rgb(255,200,0)'], [0.5, 'rgb(200,0,0)'], [0.75, 'rgb(120,0,0)'], [1.0, 'rgb(0,0,0)']], smooth_sigma=1, width=None, height=600, time_bins=150, sample=100000, plot_title='Output per channel over experiment time')
    Plot a yield over time
    * colorscale: a valid plotly color scale https://plot.ly/python/colorscales/ (Not recommanded to change)
    * smooth_sigma: sigma parameter for the Gaussian filter line smoothing
    * width: With of the ploting area in pixel
    * height: Height of the ploting area in pixel
    * time_bins: Number of bins to divide the time values in (y axis)
    * sample: If given, a n number of reads will be randomly selected instead of the entire dataset

In [35]:
# Run cell with Ctrl + Enter
p  = pycoQC ("./data/Albacore-1.2.1_basecall-1D-DNA_small_sequencing_summary.txt.gz")
fig = p.channels_activity ()
iplot(fig, show_link=False)

Generate a sequencing summary file from fast5 files

pycoQC comes with a small utility tool to generate a sequencing summary file when it is not available (say your genomic facility doesn't keep it).

The program can also attempt to extract additional information including the file path (include_path) corresponding to each read and the following fields:

  • mean_qscore_template
  • sequence_length_template
  • called_events
  • skip_prob
  • stay_prob
  • step_prob
  • strand_score
  • read_id
  • start_time
  • duration
  • start_mux
  • read_number
  • channel
  • channel_digitisation
  • channel_offset
  • channel_range
  • channel_sampling
  • run_id
  • sample_id
  • device_id
  • protocol_run
  • flow_cell
  • calibration_strand
  • barcode_arrangement
  • barcode_full
  • barcode_score

If a field is not found or is invalid, it is simply ignored for the current fast5 file.
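
The "ignore what is missing" behaviour, together with the "fields found" / "fields not found" tallies printed by the tool, can be pictured with the sketch below. Plain dictionaries stand in for the attributes read from each fast5 file, and the function name is hypothetical. Actual fast5 parsing is not shown.

```python
from collections import Counter


def extract_fields(read_attributes, fields):
    """Keep the requested fields for each read and tally hits and misses.

    read_attributes: list of dicts, one per fast5 file (a stand-in for real parsing).
    """
    found, not_found = Counter(), Counter()
    rows = []
    for attributes in read_attributes:
        row = {}
        for field in fields:
            value = attributes.get(field)
            if value is None:
                not_found[field] += 1  # missing field: skipped for this read only
            else:
                row[field] = value
                found[field] += 1
        rows.append(row)
    return rows, found, not_found
```

A field missing from every file, such as barcode_arrangement in the second run shown in this section, ends up with a "not found" count equal to the number of reads.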

Multiprocessing is supported to speed up the data extraction (threads).

If generated with the minimal default fields, the file is compatible with pycoQC.

In [36]:
from pycoQC.Fast5_to_seq_summary import Fast5_to_seq_summary
In [37]:
Fast5_to_seq_summary (fast5_dir="./data/fast5/", seq_summary_fn="./data/fast5/summary_sequencing.tsv", threads=6, verbose_level=1, fields=["mean_qscore_template", "called_events", "duration"])
!head {"./data/fast5/summary_sequencing.tsv"}
Check input data and options
Start processing fast5 files
22 reads [00:00, 2072.47 reads/s]
Overall counts 	valid files: 22
fields found 	mean_qscore_template: 22	called_events: 22	duration: 22
fields not found 
Total reads: 22 / Average speed: 445.21 reads/s

mean_qscore_template	called_events	duration
7.608	1615	24233
8.544	3740	56107
8.304	1547	23218
8.219	2080	31208
8.325	3846	57697
8.206	1649	24747
8.23	2778	51387
8.124	2978	44675
7.337	1972	29589
In [38]:
Fast5_to_seq_summary (fast5_dir="./data/fast5/", seq_summary_fn="./data/fast5/summary_sequencing.tsv", threads=6, verbose_level=1, include_path=True)
!head {"./data/fast5/summary_sequencing.tsv"}
Check input data and options
Start processing fast5 files
22 reads [00:00, 7873.27 reads/s]
Overall counts 	valid files: 22
fields found 	read_id: 22	run_id: 22	channel: 22	start_time: 22	sequence_length_template: 22	mean_qscore_template: 22	calibration_strand_genome_template: 22
fields not found 	barcode_arrangement: 22
Total reads: 22 / Average speed: 734.1 reads/s

read_id	run_id	channel	start_time	sequence_length_template	mean_qscore_template	calibration_strand_genome_template	path
2c32553e-62c6-4c7a-bf05-249771364f04	40ebe55356ada6c830fa793745ef4c498d896c73	237	11	1151	8.544	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_237_strand.fast5
e6a8e4d0-7b3c-471a-be26-fa7857d12663	40ebe55356ada6c830fa793745ef4c498d896c73	318	15	392	8.304	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_318_strand.fast5
f8325de9-a77e-4616-a4a8-69ecf32e1688	40ebe55356ada6c830fa793745ef4c498d896c73	354	16	568	8.206	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_354_strand.fast5
3e81c32a-f2ee-4719-a88d-e0affe93d26f	40ebe55356ada6c830fa793745ef4c498d896c73	348	24	1137	8.124	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_348_strand.fast5
68804104-71dc-465c-b82d-3a99a4689701	40ebe55356ada6c830fa793745ef4c498d896c73	38	20	1010	8.325	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_38_strand.fast5
3784283c-47cc-48ac-8d7b-7efd32123b56	40ebe55356ada6c830fa793745ef4c498d896c73	243	20	893	8.54	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_243_strand.fast5
9a1c5296-2ab1-4abd-8d50-e059754cf332	40ebe55356ada6c830fa793745ef4c498d896c73	319	33	1235	8.119	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_319_strand.fast5
5b7fadd0-c646-4c7b-9800-66ee658a5ca8	40ebe55356ada6c830fa793745ef4c498d896c73	150	37	468	7.608	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_150_strand.fast5
6af04302-04c8-4d8d-8e87-aa69178b3f24	40ebe55356ada6c830fa793745ef4c498d896c73	36	26	832	8.234	filtered_out	/home/aleg/Programming/pycoQC/docs/data/fast5/20180625_FAH77625_MN23126_sequencing_run_S1_57529_read_10_ch_36_strand.fast5